J.  MacQueen

University  of  California,  Los  Angeles 
1.  Introduction 

Not infrequently, the decision-making process goes something
like  this:  there  is  first  given  some  sort  of  initiating  problem  or 
goal.  This  leads  to  exploration,  or  search,  for  possible  solutions. 

As  various  possibilities  are  discovered,  they  are  evaluated  tentatively 
and  some  idea  of  their  worth  as  solutions  to  the  initiating  problem  is 
obtained.  On  this  basis,  it  may  be  decided  to  accept  a  certain  possibility 
without  further  consideration,  or  it  may  be  decided  to  evaluate  a  certain 
possibility  more  carefully,  that  is,  to  experiment  or  test,  or,  it  may  be 
decided  to  seek  other  possibilities,  that  is,  to  continue  exploring. 
However,  after  varying  amounts  of  exploration  alternating  with  various 
amounts  of  evaluation,  an  acceptable  possibility  is  eventually  located 
and the process is stopped, at least as regards the particular problem
at hand.

In this paper, we present a mathematical model for decision
making as viewed in this light, i.e., as a more or less sequential
process which involves constant alternation between exploratory and
evaluative operations until a satisfactory possibility is located. We
then describe the theoretical optimal behavior in the context of the
model. The optimal behavior is of possible normative interest, and is
of use in experimental studies of decision making where it is desirable
to compare actual behavior with various kinds of theoretical decision
rules.

The  features  of  decision  making  and  behavior  in  general  noted 
above have been described by a number of writers. Among others,
E. C. Tolman, John Dewey, and H. A. Simon have paid special attention
to the activity of arriving at a course of action through active
exploration and evaluation, and more or less formal models relevant to
this kind of phenomenon have been described by several persons, e.g.,
Stigler (8), Ashby (1), and Toda (9), as well as Simon (7). The model
described here
supplements  this  work  by  making  possible  explicit  description  of  the 
optimal  balancing  of  exploratory  and  evaluative  activity  within  a  well- 
defined  context.  We  give  up  a  certain  amount  of  the  generality  inherent 
in  some  of  the  above  models  in  order  to  gain  a  measure  of  precision. 

It  is  possible  to  give  many  instances  of  group  and  individual 
behavioral  sequences,  of  the  general  sort  alluded  to  above,  in  which 
decisions  vis-a-vis  exploratory  and  evaluational  activities  play  a  central 
role.  These  decisions  appear  to  permeate  human  experience  and  behavior,  and 
it  seems  likely  that  differences  in  policy  with  respect  to  such  decisions, 
often  operating  in  an  intuitive  and  unconsidered  fashion,  lie  at  the  basis 
of  important  differences  in  individual  and  organizational  performance.  For 
this  reason  alone,  it  seems  worthwhile  to  attempt  to  obtain  an  explicit 
and  thorough  understanding  of  the  problem  under  various  sets  of  limited 
conditions  such  as  are  considered  here. 

Section 2 below describes the formal model and Section 3
presents the mathematical results characterizing the optimal policy.
The main result is that the solutions to certain given systems of
equations uniquely characterize the optimal policy. The proofs of these
results, which are not difficult, are given in Section 4.

In  Section  5,  the  model  is  applied  to  the  case  where  it  is 
desired  to  carry  on  many  "search  and  evaluation"  processes  of  the  same 
sort,  subject  to  a  constraint  on  the  total  expenditure  for  conducting 
all  of  them.  This  version  of  the  model  may  be  of  value  in  practical 
selection  problems  where  many  objects  must  be  picked  out.  Section  6 
points  out  some  aspects  of  the  problem  of  obtaining  numerical  solutions 
when  the  distributions  governing  the  search  and  evaluation  process  have 
the  form  of  the  joint  normal. 

2.  The  Model 

Let $(X_1, Y_1, V_1), (X_2, Y_2, V_2), \ldots$ be a sequence of
independent, real-valued triples, with known, common joint distribution
$H$. The decision maker first pays an amount $c_s > 0$, called the search
cost, which gives him an opportunity to choose a possibility whose worth
is given by $V_1$. However, the decision maker is told only $X_1$, which,
because of the joint distribution $H$, gives him some information about
$V_1$. At this point he may either stop, taking $V_1$ as his reward,
continue, in which case he has a chance at $V_2$ and learns $X_2$, or take
an action referred to as a test, which costs $c_T > 0$ and enables him to
learn $Y_1$, thus gaining more information about $V_1$. In the latter
case, having observed $Y_1$, again he may stop, receiving $V_1$, or
continue, getting a chance at $V_2$ and learning $X_2$, still at a cost
$c_s$. However, this entails permanent loss of the option of taking $V_1$,
as does the decision to continue directly after observing $X_1$. Once
$X_2$ is observed, these same possibilities are available, and so on.
Thus, after continuing $n$ times, $X_n$ is known and the decision maker
can stop, taking $V_n$, continue on to learn $X_{n+1}$, or test, observing
$Y_n$, and then again either stop with $V_n$ or continue. Continuing
always results in loss of option. The cost* of continuing is always $c_s$
and the cost of a test is always $c_T$. The problem is when to continue,
when to test, and when to stop, in order to maximize the expected net
return.
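
As an illustration only, the process just described can be simulated. The sketch below is not part of the original analysis: it assumes a joint normal model in which $X$ is standard normal, $V = X + \tau\varepsilon$ with $\varepsilon$ standard normal, and the test reveals $V$ exactly. The function names and the threshold parameters `r_lo`, `r_hi`, and `v_cut` are hypothetical.

```python
import random

def simulate_once(rng, r_lo, r_hi, v_cut, c_s, c_t, tau):
    """One run of the search/test/stop process; returns reward minus total cost.

    Assumed model (illustrative): X ~ N(0, 1), V = X + tau * noise, and the
    test reveals V exactly.  The rule continues for X < r_lo, stops without
    testing for X > r_hi, and tests otherwise.
    """
    cost = 0.0
    while True:
        cost += c_s                          # pay the search cost, observe X
        x = rng.gauss(0.0, 1.0)
        v = x + tau * rng.gauss(0.0, 1.0)    # hidden worth of this possibility
        if x < r_lo:
            continue                         # continue: this possibility is lost
        if x > r_hi:
            return v - cost                  # stop without testing
        cost += c_t                          # test: the worth is now known
        if v >= v_cut:
            return v - cost                  # stop after testing
        # otherwise continue after testing; the possibility is lost

def mean_net_return(n_runs, seed, r_lo, r_hi, v_cut, c_s, c_t, tau):
    """Monte Carlo estimate of the expected net return of a threshold rule."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        total += simulate_once(rng, r_lo, r_hi, v_cut, c_s, c_t, tau)
    return total / n_runs
```

Setting `r_lo` equal to `r_hi` gives a rule that never tests, whose estimated return can be compared with the value of stopping at the first $X$ above that threshold.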

If  the  opportunity  to  test  is  eliminated,  the  resulting 
problem  is  essentially  identical  to  one  mentioned  illustratively  by 
MacQueen  and  Miller(6)  and  Chow  and  Robbins(3),  except  that  these  writers 
permitted  the  decision  maker  to  return  to  an  earlier  opportunity  if  he 
desired.  However,  the  option  of  returning  to  an  earlier  possibility  is 
never used, and a little consideration shows that our model is not
complicated in any important way by permitting complete option. Because
of the independence and the fact that $H$, $c_s$, and $c_T$ are known and
constant,  the  expectation  about  the  future  is  constant.  If  the  optimal 
future  is  good  enough  to  lead  one  to  pass  up  an  opportunity  to  test,  or 
having  tested,  an  opportunity  to  stop,  it  will  always  be  so.  Thus, 
options  on  these  opportunities  will  never  be  used. 

The analysis is based heavily on the a priori distribution of the
outcome of testing and then stopping given the information $X$; that is,
the distribution of $Z = E(V \mid X, Y)$ when $X$ is known. This
distribution, which may be obtained from $H$, is represented by
$F(z \mid x) = P[Z \le z \mid X = x]$. We suppose that $Z$ has a density
$f(z \mid x)$ for each $x$ and that $X$ itself has a density $f(x)$. The
distribution function of $X$ is $F(x)$. We assume that $f(z \mid x)$ and
$f(x)$ are positive for all $z$ and all $x$, and that

$$E(V) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} z f(z \mid x)\, dz\, f(x)\, dx$$

exists and is finite. Furthermore, $F(z \mid x)$ is assumed to be
differentiable with respect to $x$ uniformly in $z$.

* We can interpret the search cost $c_s$ as including a certain amount of
expense entailed in acquiring the preliminary information $X$, as well
as the cost of producing the possibility.

The  above  assumptions  are  more  or  less  regularity  assumptions, 
which  enable  certain  mathematical  tools  to  be  applied.  More  important, 
we  will  require  F(z|x)  to  satisfy  the  following  two  conditions: 

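$C_1$: $\frac{\partial}{\partial x} F(z \mid x) < 0$, and $C_2$:
$E(V \mid X = x) = \int_{-\infty}^{\infty} z f(z \mid x)\, dz \to \pm\infty$
as $x \to \pm\infty$, and there exist functions $\alpha$, $\beta$, with
$\alpha(x) \to \infty$ as $x \to \infty$ and $\beta(x) \to -\infty$ as
$x \to -\infty$, such that for arbitrary positive $\delta_i$,
$i = 1, 2, 3, 4$, and all $x$ sufficiently large,

(1a)  $F(\alpha(x) \mid x) < \delta_1$ and $\int_{-\infty}^{\alpha(x)} |z| f(z \mid x)\, dz < \delta_2$,

while for all $x$ sufficiently small,

(1b)  $1 - F(\beta(x) \mid x) < \delta_3$ and $\int_{\beta(x)}^{\infty} |z| f(z \mid x)\, dz < \delta_4$.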

When $C_1$ obtains, we will say $F(z \mid x)$ is stochastically
ordered in $x$. For some pertinent remarks on this concept, see Karlin
(3, p. 234) and Lehmann (4, p. 73). The condition is satisfied for many
families of distributions, including the normal and any family in which
$x$ corresponds to a location parameter; that is, where
$F(z \mid x) = G(z - x)$ for some fixed distribution $G$.

3 In fact, for purposes of determining the optimal policy, the random
variables $Y$ and $V$ can be dispensed with altogether, and the treatment
be based on $Z$ and $X$ alone, or, as is done in Section 3, on $Z$ and the
random variable $E(V \mid X)$. However, in the psychological studies of
decision making for which this model was devised, the subject may be
required to learn the relation between the random variables or "cues" $Y$
and $X$, and the random outcome $V$, and it is convenient to have the
model formulated in these terms.

Condition $C_2$ is a mathematically convenient way of ensuring
that the mass in $F(z \mid x)$ follows the mean in a certain sense. Roughly
speaking, the condition implies that for a given possibility, if the
information $X = x$ is sufficiently favorable, testing is not likely to
disclose that in actuality the possibility is poor, and if the
information is sufficiently unfavorable, testing is not likely to disclose
that the possibility is actually good. The mathematical function of this
condition is seen explicitly in the proof of Lemma 2 below.

We  assume  there  exists  an  optimal  stationary  policy,  that  is,  a 
policy  which  depends  only  on  the  variable  information  available  at  each 
stage,  X  after  continuing,  or  (X,Y)  after  testing,  and  which  in  fact 
achieves  the  least  upper  bound,  assumed  finite,  with  respect  to  the  class 
of all policies. With this assumption in mind, the term optimal is
hereafter used to refer to optimal stationary policies.

When $C_1$ and $C_2$ hold, and $c_T$ is positive, the optimal
policy takes a particularly simple form. For certain constants $v^*$,
$x^*$, $y^*$, and $x^\circ$, either (i) continue for $X \le x^*$, test for
$x^* \le X \le y^*$, and stop without testing for $X \ge y^*$, and stop or
continue after testing depending on whether or not $Z \ge v^*$, or (ii)
never test and either continue or stop depending on whether
$X \le x^\circ$ or $X \ge x^\circ$. A criterion is given for determining
whether the optimal policy takes the form (i) or (ii), and equations
characterizing the optimal constants $v^*$, $x^*$, $y^*$, and $x^\circ$
are given.

In the above and throughout the following, a policy is described
in the permissive sense by the largest intervals in which each of the
various decisions is allowed under the policy. Whenever a given value
of $X$, or $Z$, as the case may be, belongs to two such intervals, either
of the corresponding decisions is permitted. In some cases, a value of
$X$ might belong to three intervals, in which case all three of the
decisions would be permitted. With this interpretation in mind, we can
say the optimal policy is unique, for the various constants $v^*$, $x^*$,
etc. are uniquely determined.
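
The permissive reading of a form (i) policy can be sketched in code. This is only an illustration under the assumption that the constants are already known; the function names and the string labels for the decisions are hypothetical.

```python
def permitted_after_search(x, x_star, y_star):
    """Decisions permitted after observing X = x under a form (i) policy.

    The intervals are closed, so a boundary value of x belongs to two
    (or, when x_star == y_star, three) intervals and every corresponding
    decision is returned.
    """
    if x_star > y_star:
        raise ValueError("form (i) requires x_star <= y_star")
    decisions = set()
    if x <= x_star:
        decisions.add("continue")
    if x_star <= x <= y_star:
        decisions.add("test")
    if x >= y_star:
        decisions.add("stop")
    return decisions

def permitted_after_test(z, v_star):
    """Decisions permitted after a test has revealed Z = z."""
    decisions = set()
    if z <= v_star:
        decisions.add("continue")
    if z >= v_star:
        decisions.add("stop")
    return decisions
```

At an interior point exactly one decision is returned; at a boundary the set contains every decision that the policy allows there.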

The method of analysis used here under conditions $C_1$ and $C_2$
may be extended to the case where $C_2$ is dropped, although some care
must be used to determine the form of the optimal policy. Because of
$C_1$ there are essentially only five qualitatively distinct kinds of
policies: never test, always test, test for all values of $X$ above some
point, test for all values of $X$ below some point, or test only for
values of $X$ in an intermediate range. As will be seen below, once the
form of the optimal policy is known, explicit equations are easily
written for the boundaries of the various intervals. Use of $C_2$ is one
way of narrowing these possibilities down to a fairly interesting case;
i.e., where there is either testing in an intermediate range or no
testing at all.

Without $C_1$, the situation becomes much more complicated. It
is  now  possible  to  have  F(z|x)  have  as  much  variability  as  desired  for 
any  given  range  of  values  of  x.  Thus,  the  potential  payoff  for  testing, 
which  roughly  speaking  depends  on  this  variability,  can  be  very  high  or 
low  anywhere,  and  the  convenient  structure  described  above  cannot  be 
expected  to  obtain. 

3.  The Optimal Policy

Let $v^*$ be the expected return from continuing and using the
optimal policy thereafter and consider the expressions:

(2)  $T(x, v^*) = v^* F(v^* \mid x) + \int_{v^*}^{\infty} z f(z \mid x)\, dz - c_T$,

(3)  $R(x) = \int_{-\infty}^{\infty} z f(z \mid x)\, dz = E(V \mid X = x)$.

The function $R(x)$ gives the expected return for stopping without
testing when, for the possibility at hand, $X = x$. The function
$T(x, v^*)$ gives the expected return for testing when $X = x$ and
proceeding optimally thereafter. This is because after testing, which
costs $c_T$, with probability $F(v^* \mid x)$ it will turn out that
$Z < v^*$, and it will be optimal to continue, which achieves $v^*$, hence
the term $v^* F(v^* \mid x)$, and if $Z \ge v^*$, it will be optimal to
stop, which has expectation $Z$, hence the term
$\int_{v^*}^{\infty} z f(z \mid x)\, dz$. If $v^*$ were known, the optimal
policy after testing would thus be clear, and after continuing the
optimal policy could be determined from comparison of $v^*$, $R(x)$ and
$T(x, v^*)$, which for $X = x$ give the returns for the three possible
decisions on the assumption that an optimal policy is to be pursued in
the future. Selection of the best from among these decisions would thus
determine the optimal policy. (We are employing the principle of
optimality (2) here.) It remains, then, to determine $v^*$ and show that
the optimal policy has the form described in Section 2.

Under $C_1$ and $C_2$, $R(x)$ is monotone in $x$ and has the domain
$(-\infty, \infty)$. Consequently, we can work with the entirely
equivalent random variable $R$ which has value $R(x)$ when $X = x$. This
is convenient since then $E(V \mid R = r) = r$. Define $x(r)$ by the
relation $r = R(x(r))$ and let $F_1(r) = F(x(r)) = P[R \le r]$ and let
$F_1(z \mid r) = F(z \mid x(r)) = P[Z \le z \mid R = r]$. These
distributions have corresponding positive densities $f_1(r)$ and
$f_1(z \mid r)$. Conditions $C_1$ and $C_2$ carry over to the new variable
$R$. Thus if $C_1$ holds, $\frac{\partial}{\partial r} F_1(z \mid r) < 0$,
and if $C_2$ holds in terms of $x$, it will also hold in terms of $r$ with
$x$ replaced by $x(r)$ in $\alpha(x)$ and $\beta(x)$, and $f(z \mid x)$
replaced by $f_1(z \mid r)$. In these terms, the expected return for
testing when $R = r$ and using the optimal policy thereafter becomes

(3a)  $T_1(r, v^*) = v^* F_1(v^* \mid r) + \int_{v^*}^{\infty} z f_1(z \mid r)\, dz - c_T$.

Clearly, $T_1(r, v)$ is continuous and differentiable in both $r$ and $v$.
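
To make (3a) concrete, the sketch below evaluates $T_1(r, v)$ under an assumed location family in which $Z$ given $R = r$ is normal with mean $r$ and standard deviation $\tau$. This family is chosen only for illustration; it is not prescribed by the model. Under this assumption the integral has the closed form $r(1 - \Phi(a)) + \tau\varphi(a)$ with $a = (v - r)/\tau$, and the code checks it against direct numerical integration. All function names are hypothetical.

```python
import math

def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def Phi(u):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def t1_closed(r, v, tau, c_t):
    """T_1(r, v) of (3a), assuming Z | R = r is normal(r, tau^2)."""
    a = (v - r) / tau
    return v * Phi(a) + r * (1.0 - Phi(a)) + tau * phi(a) - c_t

def t1_numeric(r, v, tau, c_t, n=20000, width=10.0):
    """The same quantity with the integral done by the midpoint rule."""
    upper = max(r, v) + width * tau      # the tail beyond this is negligible
    h = (upper - v) / n
    tail = 0.0
    for k in range(n):
        z = v + (k + 0.5) * h
        tail += z * phi((z - r) / tau) / tau
    return v * Phi((v - r) / tau) + tail * h - c_t
```

The two evaluations should agree to within the integration error, and a finite-difference slope of `t1_closed` in `r` can be used to inspect the bound on $\partial T_1 / \partial r$ stated in Lemma 1.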

We  will  make  use  of  several  easy  lemmas  whose  proofs  are  given 
in  Section  4.  The  main  point  of  these  is  to  yield  propositions  I  and  II 
below. 

Lemma 1.  Under the stochastic ordering condition $C_1$, for every fixed
$v$, $T_1(r, v)$ is monotone in $r$ with $0 < \partial T_1 / \partial r < 1$,
and hence for fixed $v$ there is at most one solution to each of the
equations

(4)  $T_1(r, v) = v$,

(5)  $T_1(s, v) = s$,

and for $r^*$ satisfying (4), $T_1(r, v) < v$ for $r < r^*$, and for $s^*$
satisfying (5), $T_1(r, v) > r$ for $r < s^*$.

Lemma 2.  Under $C_2$ and with $c_T > 0$, each of the equations (4) and
(5) in fact has at least one finite solution for every $v$.
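
Because Lemmas 1 and 2 give exactly one sign change for each of (4) and (5), the solutions can be located by bisection. The following is a minimal sketch, again assuming the illustrative model in which $Z$ given $R = r$ is normal with mean $r$ and standard deviation $\tau$; the bracket half-width `span` and all function names are assumptions of the example.

```python
import math

def phi(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def Phi(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def t1(r, v, tau, c_t):
    """T_1(r, v), assuming Z | R = r is normal(r, tau^2)."""
    a = (v - r) / tau
    return v * Phi(a) + r * (1.0 - Phi(a)) + tau * phi(a) - c_t

def bisect(g, lo, hi, iters=200):
    """Root of a continuous g on [lo, hi]; g(lo) and g(hi) must differ in sign."""
    g_lo = g(lo)
    if (g_lo > 0) == (g(hi) > 0):
        raise ValueError("no sign change on the bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve_r(v, tau, c_t, span=20.0):
    """Solution of (4): T_1(r, v) = v.  The bracket assumes c_t is well below span * tau."""
    return bisect(lambda r: t1(r, v, tau, c_t) - v, v - span * tau, v + span * tau)

def solve_s(v, tau, c_t, span=20.0):
    """Solution of (5): T_1(s, v) = s, under the same bracket assumption."""
    return bisect(lambda s: t1(s, v, tau, c_t) - s, v - span * tau, v + span * tau)
```

In this particular location family the two solutions lie symmetrically about $v$, which is a property of the assumed model rather than of the general theory.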

Using Lemmas 1 and 2, and by inspection of Figure 1, we obtain
the following:

Figure 1


Lemma 3.  Under $C_1$ and $C_2$ and with $c_T > 0$, either (i), there is an
interval $[r^*, s^*]$ ($r^* \le s^*$) such that continuing is optimal given
$R = r \le r^*$, testing is optimal given $R = r \in [r^*, s^*]$, and
stopping without testing is optimal given $R = r \ge s^*$, or else (ii),
there is no such interval and for some point $r^\circ$ ($= v^*$),
continuing is optimal given $R = r \le r^\circ$, and stopping without
testing is optimal given $R = r \ge r^\circ$.

With  Lemma  3  in  mind,  we  easily  derive  the  following: 

Proposition I.  Suppose that condition (i) of Lemma 3 obtains. Then $v^*$,
together with $r^*$ and $s^*$, must satisfy the system of equations

(6)  $v = v F_1(r) + \int_r^s T_1(t, v) f_1(t)\, dt + \int_s^{\infty} t f_1(t)\, dt - c_s$,

(7)  $T_1(r, v) = v$,

(8)  $T_1(s, v) = s$,

together with the auxiliary condition $r \le s$. Equation (6) merely
equates the expected return from continuing under any policy of the type
described under (i), expressed in two different ways. Equation (7) is
necessary for the optimal point $r^*$ inasmuch as from the continuity
of $T_1$, and Lemmas 1 and 2, there is a unique point where
$T_1(r, v^*) = v^*$. Similarly for equation (8). We note that if (i)
fails, this system of equations cannot be satisfied by the optimal
expected return $v^*$ and is meaningless.

A  similar  argument  gives  the  following: 

Proposition II.  Under (ii), $v^*$ and $r^\circ$ must satisfy the equations

(9)  $v = v F_1(r) + \int_r^{\infty} t f_1(t)\, dt - c_s$

and

(10)  $v = r$;

that is, $v^*$ (or $r^\circ$) must satisfy

(9a)  $v = v F_1(v) + \int_v^{\infty} t f_1(t)\, dt - c_s$.
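
As a worked illustration of (9a), suppose, purely for the sake of example, that $R$ is standard normal, so that $F_1 = \Phi$ and $\int_v^{\infty} t f_1(t)\, dt = \varphi(v)$. Equation (9a) then becomes $v = v\Phi(v) + \varphi(v) - c_s$, which the sketch below solves by bisection. The distributional assumption, the bracket $[-10, 10]$, and the function names are not from the original.

```python
import math

def phi(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def Phi(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def never_test_gap(v, c_s):
    """Left side minus right side of (9a), assuming R is standard normal."""
    return v - (v * Phi(v) + phi(v) - c_s)

def solve_never_test(c_s, lo=-10.0, hi=10.0, iters=200):
    """Bisection for the root of (9a); the gap is negative at lo and positive at hi."""
    if not (never_test_gap(lo, c_s) < 0.0 < never_test_gap(hi, c_s)):
        raise ValueError("bracket does not contain the root")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if never_test_gap(mid, c_s) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The root is both the value of the best rule that never tests and its stopping threshold, in agreement with (10).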

We are thus in a position to determine $v^*$ and the optimal policy
as well, except for two things: the possibility of non-uniqueness of the
solutions to the above systems of equations, and the matter of knowing
whether (i) or (ii) obtains. Theorems 1, 2, and 3 settle these
questions. The proofs of these theorems are given in Section 4 along with
the proofs of the above lemmas.

Theorem 1.  Under the stochastic ordering condition $C_1$ there is at most
a single triple $(v^*, r^*, s^*)$ which simultaneously satisfies the system
of equations (6), (7), (8) with $r \le s$.

Theorem 2.  Equation (9a) has exactly one solution.

We note that regardless of whether (i) or (ii) obtains,
Theorem 2 ensures that equation (9a) characterizes the optimal expected
return $v^\circ$ in the class of policies in which testing is never
permitted.

Theorem 3.  Under $C_1$ and $C_2$ and with $c_T > 0$, condition (i) of
Lemma 3 holds if and only if $T_1(v^\circ, v^\circ) \ge v^\circ$, where
$v^\circ$ is the unique solution to equation (9a).

Theorem 3 provides that testing is optimal for some value of
$r$ if and only if at $r = v^\circ$ it is possible to do at least as well
as with the best policy, say $R^\circ$, in the class of policies which
never permit testing, simply by testing once and using $R^\circ$
thereafter, for the best policy in this class is in fact obtained from
(9a), as we have seen.

To obtain the optimal policy in terms of the variables $X$ and
$Y$, $x^*$, $y^*$, and $x^\circ$ may be computed from $r^*$, $s^*$, and
$r^\circ$, respectively, using the transformation $x(r)$. Alternatively,
the systems of equations (6), (7), (8) and (9), (10) can be formulated and
solved in terms of the variables $X$ and $Y$ directly, since the
one-to-one character of the transformation $x(r)$ ensures that the
various uniqueness results given above will carry over, as will the test
for the form of the optimal policy offered by Theorem 3.
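
The results of this section suggest a computational recipe: solve (9a) for $v^\circ$, apply the criterion of Theorem 3, and if testing is worthwhile, find the root of (6) with $r$ and $s$ determined from (7) and (8). The sketch below follows that recipe under an assumed model that is not part of the original text: $R$ is standard normal and $Z$ given $R = r$ is normal with mean $r$ and standard deviation $\tau$. The brackets, the Simpson rule, and all names are choices made for the example.

```python
import math

def phi(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def Phi(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def t1(r, v, tau, c_t):
    """T_1(r, v), assuming Z | R = r is normal(r, tau^2)."""
    a = (v - r) / tau
    return v * Phi(a) + r * (1.0 - Phi(a)) + tau * phi(a) - c_t

def bisect(g, lo, hi, iters=100):
    """Root of a continuous g on [lo, hi]; requires a sign change."""
    g_lo = g(lo)
    if (g_lo > 0) == (g(hi) > 0):
        raise ValueError("no sign change on the bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simpson(g, a, b, n=400):
    """Composite Simpson rule with an even number n of panels."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(a + k * h)
    return total * h / 3.0

def solve_policy(c_s, c_t, tau):
    # Step 1: value of the best never-test rule, from (9a) with R ~ N(0, 1).
    v0 = bisect(lambda v: v - (v * Phi(v) + phi(v) - c_s), -10.0, 10.0)
    # Step 2: criterion of Theorem 3.
    if t1(v0, v0, tau, c_t) < v0:
        return {"form": "ii", "v": v0, "v0": v0}

    def thresholds(v):
        # Equations (7) and (8); the brackets assume c_t is well below 20 * tau.
        r = bisect(lambda r: t1(r, v, tau, c_t) - v, v - 20.0 * tau, v + 20.0 * tau)
        s = bisect(lambda s: t1(s, v, tau, c_t) - s, v - 20.0 * tau, v + 20.0 * tau)
        return r, s

    def gap(v):
        # Left side minus right side of (6); increasing in v by Theorem 1.
        r, s = thresholds(v)
        middle = simpson(lambda t: t1(t, v, tau, c_t) * phi(t), r, s)
        return v - (v * Phi(r) + middle + phi(s) - c_s)

    hi = v0 + 1.0
    while gap(hi) <= 0.0:               # widen until the root is bracketed
        hi += 1.0
    v = bisect(gap, v0 - 1.0, hi)
    r, s = thresholds(v)
    return {"form": "i", "v": v, "v0": v0, "r": r, "s": s, "residual": gap(v)}
```

For example, `solve_policy(0.1, 0.1, 1.0)` returns a form (i) policy whose value exceeds that of the best never-test rule, while a sufficiently large test cost leads to form (ii).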


4.  Proofs 

The function (3a) can be written in either of the forms

(11)  $T_1(r, v) = v + \int_v^{\infty} (1 - F_1(z \mid r))\, dz - c_T$

or

(12)  $T_1(r, v) = r + \int_{-\infty}^{v} F_1(z \mid r)\, dz - c_T$.

These formulae may be verified by integrating by parts. Thus, for (11)
we find $v + z(1 - F_1(z \mid r)) \big|_v^{\infty} + \int_v^{\infty} z f_1(z \mid r)\, dz - c_T$.
To evaluate $\lim_{z \to \infty} z(1 - F_1(z \mid r))$ we use
$z(1 - F_1(z \mid r)) \le \int_z^{\infty} x f_1(x \mid r)\, dx \to 0$ as
$z \to \infty$ (since $\int_{-\infty}^{\infty} x f_1(x \mid r)\, dx$ exists
and is finite). Equation (12) is verified in a similar manner using the
fact that $r = \int_{-\infty}^{v} z f_1(z \mid r)\, dz + \int_v^{\infty} z f_1(z \mid r)\, dz$.

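Lemma 1.  To prove Lemma 1, we find from (11) that
$\partial T_1 / \partial r = -\int_v^{\infty} \frac{\partial}{\partial r} F_1(z \mid r)\, dz > 0$,
since under $C_1$, $\frac{\partial}{\partial r} F_1(z \mid r) < 0$, and
from (12) that
$\partial T_1 / \partial r = 1 + \int_{-\infty}^{v} \frac{\partial}{\partial r} F_1(z \mid r)\, dz < 1$,
for the same reason. Thus, $0 < \partial T_1 / \partial r < 1$, as was to
be shown.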

Lemma 2.  To prove Lemma 2, consider first equation (4). We need to show
that under $C_2$, and if $c_T > 0$, there is a solution to $T_1(r, v) = v$
for every fixed $v$. Inspection of (12) shows that for large positive
values of $r$, $T_1(r, v)$ will exceed $v$. For large negative values of
$r$, $T_1(r, v)$ will be below $v$. To see this, we choose the function
$\beta$ in $C_2$ corresponding to $\delta_3$, $\delta_4$ satisfying
$|v| \delta_3 < c_T / 2$ and $\delta_4 < c_T / 2$, and apply (1b) with $r$
such that $\beta^\circ = \beta(x(r)) < v$. Then

$T_1(r, v) = v + v(F_1(v \mid r) - 1) + \int_v^{\infty} z f_1(z \mid r)\, dz - c_T$

$\le v + |v| \cdot |F_1(\beta^\circ \mid r) - 1| + \int_{\beta^\circ}^{\infty} |z| f_1(z \mid r)\, dz - c_T$

$< v + c_T / 2 + c_T / 2 - c_T = v$.

From the continuity of $T_1$ there must be an intermediate value of $r$
for which $T_1(r, v) = v$.

Similarly, for equation (5), $T_1(s, v) \ge v - c_T > s$ for large
negative values of $s$. To show that $T_1(s, v) < s$ for large positive
values of $s$, the function $\alpha$ is chosen corresponding to
$\delta_1, \delta_2 > 0$ for which $|v| \delta_1 < c_T / 2$ and
$\delta_2 < c_T / 2$, and then $s$ is selected so that
$\alpha^\circ = \alpha(x(s)) > v$. Applying (1a),

$T_1(s, v) = v F_1(v \mid s) + s - \int_{-\infty}^{v} z f_1(z \mid s)\, dz - c_T$

$\le |v| \cdot |F_1(\alpha^\circ \mid s)| + s + \int_{-\infty}^{\alpha^\circ} |z| f_1(z \mid s)\, dz - c_T$

$< c_T / 2 + s + c_T / 2 - c_T = s$.


Theorem 1.  To prove Theorem 1, we note that by virtue of Lemma 1, it is
only necessary to show that the system of equations (6), (7) and (8),
subject to $r \le s$, has a single solution in $v$, since then by this
lemma $r$ and $s$ are uniquely determined. Let

$\varphi(v) = v - \left[ v F_1(r) + \int_r^s T_1(t, v) f_1(t)\, dt + \int_s^{\infty} t f_1(t)\, dt - c_s \right]$

so that $\varphi(v) = 0$ is equivalent to (6). Differentiating $\varphi$
subject to (7) and (8) we obtain

$\varphi'(v) = 1 - \left[ F_1(r) + v f_1(r) r' + T_1(s, v) f_1(s) s' - T_1(r, v) f_1(r) r' + \int_r^s \frac{\partial}{\partial v} T_1(t, v) f_1(t)\, dt - s f_1(s) s' \right]$,

and on using (7) and (8),

$\varphi'(v) = 1 - \left[ F_1(r) + \int_r^s \frac{\partial}{\partial v} T_1(t, v) f_1(t)\, dt \right]$.

From (12) $\partial T_1 / \partial v = F_1(v \mid r)$, so that

$\varphi'(v) = 1 - \left[ F_1(r) + \int_r^s F_1(v \mid t) f_1(t)\, dt \right]$.

But since $r \le s$, $0 \le \int_r^s F_1(v \mid t) f_1(t)\, dt \le F_1(s) - F_1(r)$.
Thus,

$0 < 1 - F_1(s) \le \varphi'(v) \le 1 - F_1(r) < 1$

and $\varphi(v) = 0$ can only have one root.


Theorem 2.  We may prove Theorem 2 in a similar way, by differentiating
$\Psi(v) = v - \left[ v F_1(v) + \int_v^{\infty} z f_1(z)\, dz - c_s \right]$.
This gives $\Psi'(v) = 1 - F_1(v)$. Hence $0 < \Psi'(v) < 1$, and
$\Psi(v) = 0$ likewise only has one root. That there is a root may be seen
on integrating by parts as in (11) above. Thus,
$\Psi(v) = -\int_v^{\infty} (1 - F_1(x))\, dx + c_s$, and
$\Psi(v) \to -\infty$ as $v \to -\infty$ and $\Psi(v) \to c_s > 0$ as
$v \to +\infty$, so that $\Psi(v) = 0$ for some intermediate value.

Theorem 3.  To prove Theorem 3, consider first the sufficiency part. We
have to show that if $T_1(v^\circ, v^\circ) \ge v^\circ$, where $v^\circ$
satisfies $v^\circ = v^\circ F_1(v^\circ) + \int_{v^\circ}^{\infty} z f_1(z)\, dz - c_s$,
then there is at least one point $r$ such that if $R = r$ the optimal
policy permits testing. Clearly $r = r^\circ = v^\circ$ is such a point,
since testing at $R = r^\circ$ and then using the best policy without
testing yields $T_1(v^\circ, v^\circ)$ and does at least as well as the
latter policy, which by Theorem 2 achieves exactly $v^\circ$.

Now we show that if there is a point $r$ such that if $R = r$ testing
is optimal, then $T_1(v^\circ, v^\circ) \ge v^\circ$. Let $v^* \ge v^\circ$
be the optimal expected return. Since $\max(v^*, r)$ can be achieved by
using the best of the two choices, continuing optimally or stopping, the
hypothesis is that for some $r$, $T_1(r, v^*) \ge \max(v^*, r)$. Since
$0 < \partial T_1 / \partial r < 1$, this means that
$T_1(v^*, v^*) - v^* \ge T_1(r, v^*) - \max(v^*, r) \ge 0$, so that
$T_1(v^*, v^*) \ge v^*$.

Now let $T_1(r, v; c) = v F_1(v \mid r) + \int_v^{\infty} z f_1(z \mid r)\, dz - c$;
i.e., $T_1(r, v; c)$ is $T_1(r, v)$ with $c_T = c$ indicated explicitly.
Obviously, $T_1(r, v; c)$ is strictly decreasing in $c$. Let
$(v_c, r_c, s_c)$ be the solution of the system (6), (7), (8) when
$c_T = c$. We have then, by the above remark, that
$T_1(v_c, v_c; c) \ge v_c$ for $v_c = v^*$. Let $c$ increase from the given
value $c_T$. Clearly, $T_1(v_c, v_c; c)$ decreases with $c$, as does
$v_c$, but from continuity considerations it is easily shown that at some
point, say $c^\circ \ge c_T$, we will have

(14)  $T_1(v_{c^\circ}, v_{c^\circ}; c^\circ) = v_{c^\circ}$.

This means $v_{c^\circ} = r_{c^\circ} = s_{c^\circ}$ and these are all
equal to $v^\circ$, for on substituting
$v_{c^\circ} = r_{c^\circ} = s_{c^\circ}$ in (6), (7) and (8) (read
$T_1(r, v; c^\circ)$ instead of $T_1(r, v)$), (6) reduces to (9a), hence
is satisfied by $v_{c^\circ} = v^\circ$, and (7) and (8) are satisfied, by
application of (14). Moreover, this solution,
$v_{c^\circ} = r_{c^\circ} = s_{c^\circ} = v^\circ$, is unique by
Theorem 1. Thus we have $T_1(v^\circ, v^\circ; c^\circ) = v^\circ$. But
$c_T \le c^\circ$, hence $T_1(v^\circ, v^\circ; c_T) \ge v^\circ$, as was
to be shown.


5.  Selecting Many Possibilities

In  this  section,  we  will  mention  two  possible  applications  of 
the  model  to  related  problems  in  which  there  is  a  cost  constraint,  and 
many  possibilities  are  to  be  selected  rather  than  just  a  single  possibility 
as  in  Section  2.  Concrete  situations  in  which  these  problems  might  arise 
are  to  be  found  in  the  area  of  personnel  selection  and  in  connection  with 
various  biological  selection  problems,  for  example,  drug  screening. 

Problem 1.  From a very large population of possibilities, it
is desired to select a fixed number $N$ using a two-stage testing
procedure. The first test of a given possibility costs $c_s^\circ$. In
order to apply the model, it is assumed the first test must be applied to
each possibility. If the second test is used, it costs $c_T^\circ$. A
measure of "true" worth $W$ for the possibilities has a known (a priori)
distribution in the population of possibilities, and the regression of $W$
on the outcome of the first test, and on the outcome of the first and
second test combined, as well as the joint distribution of these
regressions, is known. The possibilities are taken at random from the
population and considered for selection one after another. The first test
is applied to each possibility, and after the outcome of the first test is
known, the possibility is either selected, rejected, or given the second
test and then either selected or rejected. The process is to be stopped as
soon as $N$ possibilities are selected. The problem is to select the $N$
possibilities in such a manner as to make their expected total worth as
high as possible, subject to the constraint that the total testing cost
does not exceed a given testing budget $C$.

We cannot solve this problem exactly. However, we can determine
the policy which does well in terms of the expected worth of the $N$
possibilities selected, subject to the constraint that the expected total
testing cost is at most $C$. This policy has the advantage of being fixed;
that is, the same procedure is applied to each possibility, and if $N$ is
large, say, forty or more, the actual cost of the policy will with high
probability not deviate from $C$ by more than a small percentage amount
and the constraint will be approximately met. We need to assume that
$C > N c_s^\circ$ in order for the approximate solution to make sense.

Consider the problem of selecting a single object as described
in Section 2, identifying $X$ with the outcome of the first test, $Y$ with
the outcome of the first and second test combined, and $V$ with the
measure of worth $W$. However, instead of identifying the cost $c_s$ with
$c_s^\circ$ and $c_T$ with $c_T^\circ$ directly, we introduce a multiplier
$\lambda$ and, with hypothetical costs $c_s = \lambda c_s^\circ$ and
$c_T = \lambda c_T^\circ$, solve the optimization problem which consists of
selecting a single possibility in such a manner as to maximize expected
net return. This gives rise to an optimal policy depending on $\lambda$,
say $\Gamma(\lambda)$. Associated with this policy will be its expected
actual cost, say $Ec(\lambda)$, and the expected worth of the single
possibility selected, say $Ev(\lambda)$. We now determine $\lambda$ so that
$Ec(\lambda) = C/N$ and select $N$ possibilities using this policy
repeatedly. It is assumed that such a $\lambda$ exists and is positive.
The expected total cost will then be $C$. The actual total cost in a
specific instance will be the sum of $N$ independent, identically
distributed random variables, and we are assured by the strong law of
large numbers that the ratio of the total cost to the expected cost
approaches unity as $N$ increases.

Now let $Ev_i$ and $Ec_i$, $i = 1, 2, \ldots, N$, be the expected return
and expected cost, respectively, for selecting the $i$th possibility using
any other procedure. We have

(15)  $Ev(\lambda) - \lambda Ec(\lambda) \ge Ev_i - \lambda Ec_i$

by the optimality of $\Gamma(\lambda)$, and if $\sum_{i=1}^{N} Ec_i \le C$
and $\lambda > 0$, we have $N\, Ev(\lambda) \ge \sum_{i=1}^{N} Ev_i$ as
required.
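
The step from (15) to the final inequality can be written out in full. The display below is a sketch of that summation; it uses only (15), the choice of $\lambda$ for which $N\, Ec(\lambda) = C$, and the two stated hypotheses on $\sum Ec_i$ and $\lambda$.

```latex
\begin{align*}
\sum_{i=1}^{N} \bigl[ Ev(\lambda) - \lambda\, Ec(\lambda) \bigr]
  &\ge \sum_{i=1}^{N} \bigl[ Ev_i - \lambda\, Ec_i \bigr]
  && \text{summing (15) over } i, \\
N\, Ev(\lambda) - \lambda C
  &\ge \sum_{i=1}^{N} Ev_i - \lambda \sum_{i=1}^{N} Ec_i
  && \text{since } N\, Ec(\lambda) = C, \\
N\, Ev(\lambda)
  &\ge \sum_{i=1}^{N} Ev_i + \lambda \Bigl( C - \sum_{i=1}^{N} Ec_i \Bigr)
   \;\ge\; \sum_{i=1}^{N} Ev_i
  && \text{since } \lambda > 0 \text{ and } \sum_{i=1}^{N} Ec_i \le C.
\end{align*}
```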

Problem 2.  Suppose the situation is as in Problem 1, with two
stages of testing possible at costs $c_s^\circ > 0$ and $c_T^\circ > 0$,
respectively, for the first and second tests. However, the requirement is
to make the total expected worth of the possibilities selected as large as
possible given a fixed budget, the number actually selected being
unrestricted. Thus, the outcome will be a random number $N$ of random
worths corresponding to the possibilities selected. For large $C$, we can
determine a "fixed" policy (see above) whose total cost is approximately
$C$ and which does approximately as well as any other fixed policy.

As in problem 1, we first determine the optimal policy for a single search and evaluation problem with hypothetical costs c_s = λc_s° and c_t = λc_t°, but now λ is chosen so that Ev(λ) - λEc(λ) = 0, where, as in problem 1, Ev(λ) is the expected value of the single possibility selected using the optimal policy r(λ) for that value of λ, and Ec(λ) is the expected actual cost using this policy. Thus, in the hypothetical problem, λ is to be chosen so that v* of Section 2 is zero. Since the model assumes positive costs, we suppose there is a positive λ which has this property.
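The condition that the optimal net return vanish can also be located numerically. The sketch below again assumes a one-stage simplification that is not the model of Section 2: worths are standard normal and observed exactly, each look costs c0, and the optimal policy for hypothetical cost λc0 accepts the first candidate above a level x*. The names are hypothetical.

```python
import math

def pdf(z):
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def sf(z):
    # P(X > z) for X ~ N(0, 1).
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def bisect_decreasing(f, lo, hi, iters=200):
    # Root of a continuous decreasing f with f(lo) > 0 > f(hi).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def reservation_level(hyp_cost):
    # Level x* solving E[(X - x*)^+] = pdf(x*) - x* sf(x*) = hyp_cost.
    return bisect_decreasing(lambda x: pdf(x) - x * sf(x) - hyp_cost, -60.0, 8.0)

def policy_stats(lam, c0):
    # Expected worth and expected actual cost of one selection under the
    # policy that is optimal for the hypothetical cost lam * c0.
    x_star = reservation_level(lam * c0)
    p = sf(x_star)
    expected_worth = pdf(x_star) / p   # E[X | X > x*]
    expected_cost = c0 / p             # geometric number of looks at cost c0
    return expected_worth, expected_cost

def net_return(lam, c0):
    ev, ec = policy_stats(lam, c0)
    return ev - lam * ec

def multiplier_for_zero_net_return(c0):
    # In this toy model the optimal net return equals x*, which decreases
    # in lam, so bisection on lam applies.
    return bisect_decreasing(lambda lam: net_return(lam, c0), 1e-6 / c0, 50.0 / c0)
```

At the resulting λ the policy satisfies Ev(λ) = λEc(λ), so λ can be read as the expected worth obtained per unit of actual cost. In this simplified setting the zero-net-return level is x* = 0.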

The proposed policy consists of using r(λ) over and over again until the testing budget C is exhausted, it being permitted, however, to use r(λ) to complete selection of the last possibility even if the budget is exceeded while this is being done, but, of course, no new selections are started. Let r' be a given policy for selecting a single possibility and suppose the expected cost Ec' of the use of the policy is less than a given constant c* and the variance of the cost, σ²_c', is less than a constant σ². Consider the class of policies formed by repeated use of any such policy r', with Ec' < c* and σ²_c' < σ², until the budget is exhausted in the above sense, that is, with selection of the possibility underway at the time the budget is exceeded being completed using r'. The proposed policy is approximately optimal in this class, in that for C sufficiently large, the proposed policy is almost certain to achieve a high proportion of the return of any element of the class. The proof of this will be outlined, a number of technical details being omitted.

For a given policy r' with Ec' < c* for selecting a single possibility, let Ev' be the expected worth of the possibility selected under this policy. We have, then, by the optimality of r(λ) in the hypothetical problem and by the choice of λ, that

(16)  0 = Ev(λ) - λEc(λ) ≥ Ev' - λEc'.

If C is large relative to Ec(λ), repeated use of r(λ) until the budget is exhausted implies, by virtue of the law of large numbers, that the number N of selections actually made will be large. Thus, we will have C ≈ c_1° + c_2° + ... + c_N° ≈ N Ec(λ), where c_1°, ..., c_N° are the random total costs incurred in making the N selections; more precisely, it can be shown that for some δ_1 whose absolute value is small relative to C with high probability, we will have N Ec(λ) = C + δ_1. Similarly, exhausting C by repeated use of any other policy r' means that for C sufficiently large, the corresponding number N' of selections will be large with high probability, provided the policy has finite expected cost for selecting a single possibility so that the law of large numbers may be applied. In fact, we will have N'Ec' = C + δ_2 for some δ_2 which with high probability will have small absolute value relative to C for all r' such that Ec' < c* and σ²_c' < σ². Now, N'(Ev' - λEc') ≤ 0 by (16), so N'Ev' ≤ λN'Ec' = λ(C + δ_2). Again, by (16), N Ev(λ) = λN Ec(λ) = λ(C + δ_1), so that N'Ev' ≤ N Ev(λ) - λδ_1 + λδ_2. The order of magnitude of the quantity -λδ_1 + λδ_2 is independent of N, and the ratio of N'Ev' to N Ev(λ) will in fact converge almost surely to a number α ≤ 1 as C → ∞.

Footnote: The interested reader can find the essential technical details worked out in Western Management Science Institute Working Paper No. 4, "Sequences of time variable games," available on request from the author.
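The comparison in this argument can be illustrated by simulation. The sketch below is not the two-stage model of the text. It assumes a one-stage simplification in which candidates have standard normal worth, each look costs c0, and a policy is an acceptance threshold. In that toy setting the threshold 0 plays the role of r(λ), with λ = 1/(c0 √(2π)), and other thresholds play the role of r'. Function names and the seed are arbitrary.

```python
import math
import random

def run_until_budget(threshold, c0, budget, rng):
    # Use one threshold policy repeatedly. A new selection is started only
    # while the budget is not yet exhausted; a selection already underway
    # is completed even if it overruns the budget.
    spent = 0.0
    total_worth = 0.0
    selections = 0
    while spent < budget:
        while True:
            spent += c0                  # pay for one look
            x = rng.gauss(0.0, 1.0)      # worth of the candidate found
            if x > threshold:
                total_worth += x
                selections += 1
                break
    return selections, total_worth, spent

def compare_thresholds(thresholds, c0, budget, seed):
    # Total worth collected by each threshold policy under the same budget.
    rng = random.Random(seed)
    return {t: run_until_budget(t, c0, budget, rng)[1] for t in thresholds}

def toy_multiplier(c0):
    # Multiplier for which the zero threshold has zero net return.
    return 1.0 / (c0 * math.sqrt(2.0 * math.pi))
```

With a large budget, the total worth from threshold 0 should be close to `toy_multiplier(c0) * budget`, and the totals from other thresholds should fall below it, which mirrors the bounds N Ev(λ) = λ(C + δ_1) and N'Ev' ≤ λ(C + δ_2).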
6.  Computation  of  the  Optimal  Policy 
The writer has been unable to find any interesting case where solutions to the system (6) can be obtained in terms of elementary functions. Of course, the equations may be solved by numerical procedures in a specific instance. Some remarks on the computational problem in the case of the normal distribution are perhaps in order. [5]

[5] Relevant tables as indicated below are being prepared for the case of the normal distribution.

If X, Y, and V in Section 2 have a joint normal distribution, it turns out that after making the obvious linear transformations, there are only three essential parameters in the model, of which two are the costs c_s and c_t. As was pointed out in Section 2, the distribution of X and the conditional distribution of the random variable Z = E(V | X, Y) are all that count. These have a joint normal distribution which involves five parameters, the means and variances of X and Z and, say, their correlation, ρ. Choice of scale and origin for X and Z (or V), which are arbitrary in the model, eliminates the means and variances. For practical purposes, then, the computational problem reduces to tabling the optimal constants v*, x*, y*, and v°, and the expected costs under the optimal policy, as a function of c_s, c_t, and ρ. [6] Of course, a number of other parameterizations are possible.

[6] Using such tables, the problem of locating the multiplier λ referred to in problems 1 and 2 of Section 5 is easily solved by trial and error, entering the table with c_s = λc_s° and c_t = λc_t° until a value of λ is located such that the optimal expected cost (problem 1) or the optimal expected net return (problem 2) has the required property.

The joint normal distribution for X, Y, and V arises in the situation where V has a known (a priori) normal distribution and the decision maker is allowed to observe as the outcome of search a variable ξ equal to V plus an error independent of V, while if he tests he is allowed to observe a variable η equal to V plus another error independent of V, the two errors having a known joint normal distribution. Here we may identify ξ with X and η with Y.

This situation may be interpreted by saying the decision maker learns about each V through a noisy channel, having the option to learn more, at a cost, by using another noisy channel.
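For the noisy-channel case, the correlation between X and Z = E(V | X, Y) follows from standard normal theory. The sketch below assumes all means have been removed by the choice of origin, that both errors are independent of V, and that the errors may be correlated with each other. The function name and argument names are hypothetical.

```python
import math

def noisy_channel_summary(var_v, var_e1, var_e2, corr_e=0.0):
    # X = V + e1 and Y = V + e2, all zero mean and jointly normal, with
    # (e1, e2) independent of V and Corr(e1, e2) = corr_e.
    if var_v <= 0.0:
        raise ValueError("V must have positive variance")
    cov_e = corr_e * math.sqrt(var_e1 * var_e2)
    var_x = var_v + var_e1
    var_y = var_v + var_e2
    cov_xy = var_v + cov_e
    det = var_x * var_y - cov_xy ** 2
    if det <= 0.0:
        raise ValueError("X and Y must not be perfectly correlated")
    # Z = a*X + b*Y, where (a, b) solves the normal equations
    #   [[var_x, cov_xy], [cov_xy, var_y]] (a, b)^T = (var_v, var_v)^T,
    # since Cov(V, X) = Cov(V, Y) = var_v. Solved here by Cramer's rule.
    a = var_v * (var_y - cov_xy) / det
    b = var_v * (var_x - cov_xy) / det
    var_z = (a + b) * var_v            # Var(Z) = Cov(V, Z) for a projection
    cov_xz = a * var_x + b * cov_xy    # Cov(X, Z)
    rho = cov_xz / math.sqrt(var_x * var_z)
    return a, b, rho
```

With unit variances and independent errors, `noisy_channel_summary(1.0, 1.0, 1.0)` gives equal weights on X and Y. The value of ρ returned is the third essential parameter mentioned in Section 6, alongside the two costs.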

Another similar case giving rise to a joint normal distribution is where V is the mean of a normal population, the population being selected from a family of normal populations, all with the same variance, in such a manner that V has itself a normal distribution. When a population is selected for consideration, a sample of n_1 independent observations on the population is first made. Further testing, if carried out, results in another sample of n_2 independent observations. Here X may be taken to be the mean of the first sample and Y to be the mean of the second sample.
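In this sampling case the conditional expectation of V has the familiar precision-weighted form. The following sketch uses the standard conjugate normal update. The prior mean and variance of V and the common population variance are assumed known, and the function and parameter names are hypothetical.

```python
def posterior_mean_and_variance(prior_mean, prior_var, pop_var,
                                n1, x_bar, n2=0, y_bar=0.0):
    # V ~ N(prior_mean, prior_var); observations are N(V, pop_var).
    # x_bar is the mean of the first n1 observations, y_bar the mean of
    # the n2 observations from further testing (n2 = 0 means no test).
    precision = 1.0 / prior_var + (n1 + n2) / pop_var
    weighted_sum = prior_mean / prior_var + (n1 * x_bar + n2 * y_bar) / pop_var
    return weighted_sum / precision, 1.0 / precision

# E(V | X) after the first sample only, then E(V | X, Y) after testing.
first_stage = posterior_mean_and_variance(0.0, 1.0, 1.0, 1, 1.0)
both_stages = posterior_mean_and_variance(0.0, 1.0, 1.0, 1, 1.0, 1, 2.0)
```

The first call corresponds to what is known after search alone, and the second to the quantity Z = E(V | X, Y) available after the second sample.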